Logistic Methods for Resource Selection Functions and Presence-Only Species Distribution Models

Authors

  • Steven Phillips
  • Jane Elith
Abstract

To better protect and conserve biodiversity, ecologists use machine learning and statistics to understand how species respond to their environment and to predict how they will respond to future climate change, habitat loss and other threats. A fundamental modeling task is to estimate the probability that a given species is present in (or uses) a site, conditional on environmental variables such as precipitation and temperature. For a limited number of species, survey data consisting of both presence and absence records are available, and can be used to fit a variety of conventional classification and regression models. For most species, however, the available data consist only of occurrence records: locations where the species has been observed. In two closely related but separate bodies of ecological literature, diverse special-purpose models have been developed that contrast occurrence data with a random sample of available environmental conditions. The most widespread statistical approaches involve either fitting an exponential model of species' conditional probability of presence, or fitting a naive logistic model in which the random sample of available conditions is treated as absence data; both approaches have well-known drawbacks, and do not necessarily produce valid probabilities. After summarizing existing methods, we overcome their drawbacks by introducing a new scaled binomial loss function for estimating an underlying logistic model of species presence/absence. Like the Expectation-Maximization approach of Ward et al. and the method of Steinberg and Cardell, our approach requires an estimate of population prevalence, Pr(y = 1), since prevalence is not identifiable from occurrence data alone. In contrast to the latter two methods, our loss function is straightforward to integrate into a variety of existing modeling frameworks such as generalized linear and additive models and boosted regression trees. We also demonstrate that approaches by Lele and Keim and by Lancaster and Imbens that surmount the identifiability issue by making parametric data assumptions do not typically produce valid probability estimates.

Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

We study a modeling task that is central to two related but largely separate bodies of ecological literature. Ecologists investigating resource selection by animals seek to characterize those areas within a region of interest that are "used" by a particular species or individual animals (Manly et al. 2002), while ecologists studying a broad range of sessile or mobile species wish to predict the suitability of sites for occupation or persistence of the species (Franklin 2010). In both cases, the available data frequently consist of a collection of geographic locations with evidence of use by (or presence of) the species, together with data on environmental covariates in the region of interest, termed available (or background) data. The most desirable output is the probability of use (resp., probability of presence) conditional on environmental covariates; the shape of the response to the covariates is also important for understanding how the species relates to its environment. Methods for estimating probability of use/presence and related indices have been used extensively for a variety of applications in ecology and conservation. According to Google Scholar, a seminal resource selection text (Manly et al. 2002) has been cited 1237 times, while an influential SDM paper (Elith et al. 2006) has received 970 citations.
The two bodies of ecological research share a set of difficult questions about data and model interpretation: what defines "presence" rather than a transitory or chance visit, how to determine absolute "absence", what defines the background area from which the species is selecting sites, and what ecological insight can be derived from model outputs (Lele and Keim 2006; Pulliam 2000; Desrochers et al. 2010; Johnson et al. 2006; Franklin 2010). We focus here on the underlying statistical questions rather than on ecological interpretation. In particular, we study maximum-likelihood logistic models of probability of presence and introduce a new method for estimating such models. For brevity, we will use only the "presence–background" terminology from here on. In practice, exponential models are most often used, fitted using logistic regression (Manly et al. 2002) or maximum entropy (Phillips, Anderson, and Schapire 2006), in both cases providing a maximum-likelihood estimate of relative probability of presence. Exponential models have the drawback of being unbounded above, so that when estimates are scaled to estimate true (rather than relative) probability of ...

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence
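The two common presence-background fits described in the excerpt can be illustrated with a minimal sketch on simulated data. It is not the paper's scaled binomial loss. The function names, the single covariate `x`, the simulated "true" response, and the plain gradient-descent fit are all assumptions made for the example. The sketch labels presence records 1 and background records 0, fits a naive logistic model, and then reads the exponentiated linear term as a relative (exponential) estimate to show that it is unbounded above.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_naive_logistic(presence_x, background_x, steps=500, lr=0.5):
    """Fit sigmoid(a + b*x) by gradient descent on the mean log-loss,
    treating presence records as label 1 and background records as label 0."""
    xs = list(presence_x) + list(background_x)
    ys = [1.0] * len(presence_x) + [0.0] * len(background_x)
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a + b * x) - y
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Simulated data (hypothetical): one standardized environmental covariate.
rng = random.Random(0)
background = [rng.gauss(0.0, 1.0) for _ in range(300)]
candidates = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Assumed true response, used only to generate presence records.
presence = [x for x in candidates if rng.random() < sigmoid(-1.0 + 1.5 * x)]

a, b = fit_naive_logistic(presence, background)

test_points = (0.0, 2.0, 10.0)
# Naive logistic output: stays in (0, 1), but it estimates the chance that a
# pooled record is a presence record, not the probability of presence.
naive = [sigmoid(a + b * x) for x in test_points]
# Exponential (relative) estimate: exp(b*x), which grows without bound in x.
relative = [math.exp(b * x) for x in test_points]

for x, p, r in zip(test_points, naive, relative):
    print(f"x={x:5.1f}  naive={p:.4f}  relative={r:.4f}")
```

In this sketch the naive output depends on the ratio of presence to background records, so it cannot be read as a probability of presence, while the exponential estimate exceeds 1 for favorable conditions unless it is rescaled. Both are instances of the drawbacks the excerpt mentions.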


Similar resources

Comparing Discriminant Analysis, Ecological Niche Factor Analysis and Logistic Regression Methods for Geographic Distribution Modelling of Eurotia ceratoides (L.) C. A. Mey

Eurotia ceratoides (L.) C. A. Mey is an important plant species in semi-arid lands in Iran. New approaches are required to determine the distribution of this plant species. For this reason, geographical distributions of Eurotia ceratoides were assessed using three different models including: Multiple Discriminant Analysis (MDA), Ecological Niche Factor Analysis (ENFA) and Logistic Regression (LR). ...


Predicting the Distribution of Leucanthemum Vulgare Lam. Using Logistic Regression in Fandoghlou Rangelands of Ardabil Province, Iran

Species Distribution Modelling (SDM) is an important tool for conservation planning and resource management. Invasive species represent a good opportunity to evaluate the predictive accuracy of SDMs with independent data, as their invasive range can expand quickly. Thus, the aim of this study was to investigate the relationships between presence of Leucanthemum vulgare Lam. and environmental v...


Logistic methods for resource selection functions and presence-only species distribution models

In order to better protect and conserve biodiversity, ecologists are making increasing use of machine learning and statistical modeling to understand how species respond to their environment and to predict how they will respond to future climate change, habitat loss and other threats. A fundamental modeling task is to estimate the conditional probability that a given species is present in (or u...


Predicting the distribution of plant species using logistic regression (Case study: Garizat rangelands of Yazd province)

The aim of this research was to study the relationships between the presence of plant species and environmental factors in the Garizat rangelands of Yazd province and to provide predictive habitat models for them. After delimitation of the study area, sampling was performed using a randomized-systematic method. Accordingly, vegetation data including presence and cover percentage were determined in each quadr...


Height and Crown Area Distribution of Cionura erecta Shrublands in Chaharmahal and Bakhtiari Province, Using Probability Distribution Functions

The importance of probability distribution functions in natural resource studies is increasing because of their effective role in improving the understanding of vegetation structure and in providing conceptual models of quantitative indices of plant species. The present study was performed to model the distribution of height and canopy area of the Cionura erecta L. shrub, using probability distribution functions in...


Considering ecological dynamics in resource selection functions.

1. Describing distribution and abundance is requisite to exploring interactions between organisms and their environment. Recently, the resource selection function (RSF) has emerged to replace many of the statistical procedures used to quantify resource selection by animals. 2. A RSF is defined by characteristics measured on resource units such that its value for a unit is proportional to the pr...




Publication date: 2011